Model Strategy — The Ford Estate
Updated: 2026-03-01
Philosophy
Never pay for what you can get free. Never use a sledgehammer when a scalpel works.
The goal: route every request to the cheapest/fastest model that can handle it,
spread usage across providers to avoid hitting limits, and reserve paid models
for tasks that genuinely need them.
Provider Tiers (by cost)
Tier 0 — FREE (use first, always)

| Provider | Best Models | Limits | Best For |
|---|---|---|---|
| NVIDIA NIM | GLM 5, Kimi K2.5, DeepSeek V3.2, Qwen3 Coder 480B | 40 RPM | Everything — S+ tier models for free |
| Cerebras | GPT-OSS-120B, Qwen3-235B, GLM-4.7 | 30 RPM, 1M TPD | Speed-critical tasks (fastest inference) |
| Groq | Kimi K2, Llama 3.3 70B, Qwen3-32B | 1K RPD, 12K TPM | Quick lookups, triage |
| OpenRouter (free) | 29 free models | 50 req/day | Overflow/fallback |
| Codestral | codestral-latest | 30 RPM, 2K RPD | Code generation/completion |
| SambaNova | Llama 3.3 70B, DeepSeek V3 | Free tier | Fast inference fallback |
Tier 1 — CHEAP (pennies per task)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| Gemini Flash | gemini-2.5-flash | ~$0.075/M input | Daily driver, coordination |
| Mistral Small | mistral-small-latest | Free (4M tokens/month cap) | European routing, fallback |
| Together AI | Various open models | $25 credit | Batch work, coding |
Tier 2 — MODERATE (use thoughtfully)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| OpenAI GPT-4o-mini | gpt-4o-mini | ~$0.15/M input | When OpenAI quality is needed |
| Mistral Medium | mistral-medium-latest | Moderate | Strong all-rounder |
| Gemini Pro | gemini-2.5-pro | ~$1.25/M input | Long context, multimodal |
Tier 3 — PREMIUM (last resort for complex tasks)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| OpenAI GPT-4o | gpt-4o | ~$2.50/M input | When you need the OpenAI flagship |
| OpenAI o3 | o3 | ~$10/M input | Deep reasoning |
| Anthropic Opus | claude-opus-4-6 | ~$15/M input | Most complex reasoning, long agentic runs |
Agent Assignments
Ada (Coordinator) — needs speed + awareness
- Primary: NVIDIA NIM kimi-k2.5 (free, S+ tier, fast)
- Fallback 1: Cerebras gpt-oss-120b (free, fastest inference)
- Fallback 2: google/gemini-2.5-flash (cheap, reliable)
- Complex tasks: Spawn Opus subagents (don't run Opus in the main session)
K2 (Tech/Homelab) — needs coding + technical accuracy
- Primary: NVIDIA NIM deepseek-v3.2 or Codestral (free, top coding)
- Fallback 1: Cerebras qwen-3-235b (free, strong reasoning)
- Fallback 2: google/gemini-2.5-flash
- Complex infra: Spawn Opus subagent
Cora (Real Estate) — needs clarity + professionalism
- Primary: NVIDIA NIM kimi-k2.5 (free, excellent writing)
- Fallback 1: google/gemini-2.5-flash
- Fallback 2: mistral/mistral-medium-latest
Winston (Family Butler) — needs warmth + reliability
- Primary: google/gemini-2.5-flash (cheap, warm tone)
- Fallback 1: NVIDIA NIM kimi-k2.5
- Fallback 2: Groq llama-3.3-70b-versatile (fast for reminders)
Synergy (Wife's Assistant) — needs warmth + privacy
- Primary: google/gemini-2.5-flash
- Fallback 1: NVIDIA NIM kimi-k2.5
- Fallback 2: Groq llama-3.3-70b-versatile
Subagents / Cron / Overnight
- Cheap work: Cerebras or Groq (free)
- Coding tasks: Codestral or NVIDIA NIM qwen3-coder-480b
- Research: google/gemini-2.5-flash or NVIDIA NIM
- Complex analysis: anthropic/claude-opus-4-6 (sparingly)
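The per-agent chains above can be expressed as data plus a retry loop. This is a minimal sketch: the provider/model IDs mirror the assignments, but the transport is passed in as a `call` callable because each provider needs its own client; `call` and its error behavior are assumptions, not an existing API.

```python
# Hypothetical fallback-chain router; provider/model IDs come from the
# agent assignments above, everything else is illustrative.
AGENT_CHAINS = {
    "ada":     ["nim/kimi-k2.5", "cerebras/gpt-oss-120b", "google/gemini-2.5-flash"],
    "k2":      ["nim/deepseek-v3.2", "cerebras/qwen-3-235b", "google/gemini-2.5-flash"],
    "cora":    ["nim/kimi-k2.5", "google/gemini-2.5-flash", "mistral/mistral-medium-latest"],
    "winston": ["google/gemini-2.5-flash", "nim/kimi-k2.5", "groq/llama-3.3-70b-versatile"],
    "synergy": ["google/gemini-2.5-flash", "nim/kimi-k2.5", "groq/llama-3.3-70b-versatile"],
}

def route(agent: str, prompt: str, call) -> str:
    """Try each model in the agent's chain until one succeeds."""
    last_err = None
    for model in AGENT_CHAINS[agent]:
        try:
            return call(model, prompt)  # a rate limit or outage raises here
        except Exception as err:
            last_err = err              # remember the failure, try the next model
    raise RuntimeError(f"all providers failed for {agent}") from last_err
```

Keeping the chains as plain data means an agent's routing can be changed without touching the retry logic.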
Load Balancing Strategy
Daily Token Budget (rough targets)
- Free providers first: ~80% of daily requests
- Gemini Flash: ~15% (overflow from free tier limits)
- Premium (Opus/GPT-4o/o3): ~5% (complex only)
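As a sanity check, the 80/15/5 split implies a very low blended input price. Treating the request split as a token split is an approximation, and the $10/M premium figure is an assumed average across o3/GPT-4o/Opus, not a quoted price:

```python
# Back-of-envelope blended input cost implied by the 80/15/5 split.
split = {"free": 0.80, "flash": 0.15, "premium": 0.05}   # share of traffic
price = {"free": 0.00, "flash": 0.075, "premium": 10.0}  # $ per M input tokens (premium is an assumed average)

blended = sum(split[t] * price[t] for t in split)
print(f"blended: ${blended:.2f} per M input tokens")     # roughly $0.51/M
```

Almost all of that ~$0.51/M comes from the 5% premium slice, which is why the "never Opus for simple tasks" rule matters.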
Limit Avoidance
- Rotate across free providers: NIM → Cerebras → Groq → SambaNova → OpenRouter
- When one hits its rate limit, fail over to the next
- Track daily usage per provider
- Groq has the lowest TPM (12K) — use it for short exchanges only
- Cerebras's 1M TPD is generous — prefer it for longer contexts
- NIM's 40 RPM is generous — it can be the default
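One way to implement the rotation is a sliding-window counter per provider. A sketch under stated assumptions: the NIM and Cerebras RPM caps come from the Tier 0 table, while Groq, SambaNova, and OpenRouter actually have day-level limits, so their per-minute values here are placeholders.

```python
import time
from collections import deque

class ProviderRotator:
    """Pick the first provider in the rotation that is under its per-minute cap."""

    def __init__(self, rpm_limits, now=time.monotonic):
        self.limits = dict(rpm_limits)               # provider -> RPM cap
        self.sent = {p: deque() for p in rpm_limits}  # timestamps of recent requests
        self.now = now                                # injectable clock, eases testing

    def pick(self):
        t = self.now()
        for provider, cap in self.limits.items():     # dict insertion order = rotation order
            window = self.sent[provider]
            while window and t - window[0] >= 60:     # drop requests older than 60 s
                window.popleft()
            if len(window) < cap:
                window.append(t)
                return provider
        return None  # everything saturated: wait, or spill to the paid tier

# Placeholder caps for the day-limited providers; NIM/Cerebras match the table.
rotator = ProviderRotator({"nim": 40, "cerebras": 30, "groq": 10,
                           "sambanova": 10, "openrouter": 2})
```

Because the state is in-memory and per-process, a multi-process setup would need to share these counters (or just over-provision the margins).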
When to Escalate
- Simple Q&A, routing, reminders → Free tier (NIM/Cerebras/Groq)
- Summarization, writing, analysis → Gemini Flash
- Multi-step reasoning, planning → Gemini Pro or spawn Opus subagent
- Code review, complex debugging → Codestral or Opus subagent
- Never run Opus for simple tasks. Ever.
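The escalation ladder reduces to a lookup with a deliberately cheap default. The task labels and tier strings below are assumptions made up for illustration; the point is the shape of the rule, especially that unknown tasks default down, not up:

```python
# The escalation ladder above as a lookup table (labels are hypothetical).
ESCALATION = {
    "qa": "free",                 # simple Q&A, routing, reminders
    "routing": "free",
    "reminder": "free",
    "summarization": "gemini-flash",
    "writing": "gemini-flash",
    "analysis": "gemini-flash",
    "planning": "gemini-pro or opus-subagent",
    "multi_step_reasoning": "gemini-pro or opus-subagent",
    "code_review": "codestral or opus-subagent",
    "debugging": "codestral or opus-subagent",
}

def escalate(task_type: str) -> str:
    # Unknown task types fall back to the free tier: never Opus for simple tasks.
    return ESCALATION.get(task_type, "free")
```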
Pending Activations
- DeepSeek: $0 balance (top up for cheapest reasoning: $0.14/M)
- xAI: No credits (buy for Grok 3 access)
- Z.AI: Depleted (top up for GLM-5 direct access)
- DashScope: Models not activated (enable in console)
- HuggingFace: Token valid, endpoint routing needs config